An Assessment of Feature Relevance in Predicting Protein Function from Sequence

Authors

  • Ali Al-Shahib
  • Chao He
  • Aik Choon Tan
  • Mark A. Girolami
  • David R. Gilbert
Abstract

Improving the performance of protein function prediction is the ultimate goal for a bioinformatician working in functional genomics. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative is to predict protein function from sequence-derived features using machine learning. In this case, the choice of features that can be derived from the sequence is of vital importance to ensure adequate discrimination when predicting function. In this paper we show that carefully assessing the discriminative value of derived features by performing feature selection improves the performance of the prediction classifiers by eliminating irrelevant and redundant features. The subset selected from the available features has also been shown to be biologically meaningful, as it corresponds to features that have commonly been employed to assess biological function.

Similar resources

firestar—prediction of functionally important residues using structural templates and alignment reliability

Here we present firestar, an expert system for predicting ligand-binding residues in protein structures. The server provides a method for extrapolating from the large inventory of functionally important residues organized in the FireDB database and adds information about the local conservation of potential-binding residues. The interface allows users to make queries by protein sequen...

National strategy on endocrine disruptors.

Here we present firestar, an expert system for predicting ligand-binding residues in protein structures. The server provides a method for extrapolating from the large inventory of functionally important residues organized in the FireDB database and adds information about the local conservation of potential-binding residues. The interface allows users to make queries by protein sequence or struc...

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences through transcription and translation. The protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

CLONING AND EXPRESSION OF LEISHMANOLYSIN GENE FROM LEISHMANIA MAJOR IN PRIMATE CELL LINES

Leishmaniasis is a worldwide disease that is caused by different species of the genus Leishmania. Leishmanolysin, one of the genes expressed by Leishmania, appears to be an ideal candidate for genetic vaccination. In this study, a full-length sequence, which encodes the functionally critical regions of Leishmanolysin (amino acids 100-579), was cloned from a Leishmania strain endemic to Iran. Analysis...

Feature selection and the class imbalance problem in predicting protein function from sequence.

When the standard approach of predicting protein function by sequence homology fails, alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence features. However, there are two issues to consider before successful functional prediction can take place:...

Publication date: 2004